training task
Model-Based Transfer Learning for Contextual Reinforcement Learning
Deep reinforcement learning (RL) is a powerful approach to complex decision-making. However, one issue that limits its practical application is its brittleness: it sometimes fails to train in the presence of small changes in the environment. Motivated by the success of zero-shot transfer, where pre-trained models perform well on related tasks, we consider the problem of selecting a good set of training tasks to maximize generalization performance across a range of tasks. Given the high cost of training, it is critical to select training tasks strategically, but it is not well understood how to do so. We hence introduce Model-Based Transfer Learning (MBTL), which layers on top of existing RL methods to effectively solve contextual RL problems. MBTL models the generalization performance in two parts: 1) the performance set point, modeled using Gaussian processes, and 2) the performance loss (generalization gap), modeled as a linear function of contextual similarity. MBTL combines these two pieces of information within a Bayesian optimization (BO) framework to strategically select training tasks. We show theoretically that the method exhibits sublinear regret in the number of training tasks and discuss conditions to further tighten regret bounds.
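The two-part model described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional context, the RBF kernel, the UCB-style acquisition, and the names `gp_posterior`, `select_training_tasks`, `toy_train_fn`, `slope`, `beta`, and `baseline` are all assumptions made for illustration. The sketch fits a small Gaussian process to the performance observed on tasks trained so far, subtracts a linear penalty in context distance to predict transfer to other tasks, and greedily picks the next training task with the largest predicted improvement.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of contexts."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and std of a GP with a constant (empirical) mean."""
    mean_y = y_train.mean()
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)            # shape (n_train, n_query)
    mu = mean_y + K_s.T @ np.linalg.solve(K, y_train - mean_y)
    v = np.linalg.solve(K, K_s)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(K_s * v, axis=0)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def select_training_tasks(contexts, train_fn, budget, slope=1.0, beta=1.0,
                          baseline=0.0):
    """Greedy task selection (hypothetical sketch, not the paper's code).

    contexts : 1-D array of candidate task contexts
    train_fn : callable returning the performance of a policy trained at a context
    slope    : assumed linear generalization-gap rate per unit context distance
    """
    chosen, scores = [], []
    # Best estimated performance on every target task so far.
    best = np.full(len(contexts), baseline, dtype=float)
    dist = np.abs(contexts[:, None] - contexts[None, :])
    for _ in range(budget):
        if chosen:
            mu, sigma = gp_posterior(contexts[chosen], np.array(scores), contexts)
        else:
            mu, sigma = np.zeros(len(contexts)), np.ones(len(contexts))
        ucb = mu + beta * sigma                   # optimistic set-point estimate
        predicted = ucb[:, None] - slope * dist   # row s: transfer from s to each target
        gain = np.maximum(predicted - best[None, :], 0.0).sum(axis=1)
        gain[chosen] = -np.inf                    # never retrain the same task
        s = int(np.argmax(gain))
        score = train_fn(contexts[s])             # the expensive RL training step
        chosen.append(s)
        scores.append(score)
        best = np.maximum(best, score - slope * dist[s])
    return chosen, best

# Toy usage: a made-up performance curve stands in for actual RL training.
def toy_train_fn(x):
    return 1.0 - 0.5 * abs(x - 0.5)

contexts = np.linspace(0.0, 1.0, 11)
chosen, best = select_training_tasks(contexts, toy_train_fn, budget=3)
```

In this toy setup the first pick is the central context, because with no observations every task has the same optimistic estimate and the center minimizes total distance to the other tasks. In practice `train_fn` would wrap a full RL training run, and `slope` would need to be estimated or assumed for the domain at hand.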
Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity
The wider application of end-to-end learning methods to embodied decision-making domains remains bottlenecked by their reliance on a superabundance of training data representative of the target domain. Meta-reinforcement learning (meta-RL) approaches abandon the aim of zero-shot--the goal of standard reinforcement learning (RL)--in favor of few-shot, and thus hold promise for bridging larger generalization gaps. While learning this meta-level adaptive behavior still requires substantial data, efficient environment simulators approaching real-world complexity are growing in prevalence. Even so, hand-designing sufficiently diverse and numerous simulated training tasks for these complex domains is prohibitively labor-intensive. Domain randomization (DR) and procedural generation (PG), offered as solutions to this problem, require simulators to possess carefully-defined parameters which directly translate to meaningful task diversity--a similarly prohibitive assumption. In this work, we present DIVA
On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning (NeurIPS 2022)
Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and ...
If you ran experiments...
(a) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? Please refer to both main text and appendix for experiment details.
Did you report error bars (e.g., with respect to the random seed after running experiments multiple ... All adaptation experiments in Procgen and RLBench are run for 3 seeds.
Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal ... As stated in section 2, we use RTX A5000 GPUs each with 24GB memory.
The C2F-ARM algorithm and training framework are built on the original author's implementation.
Did you mention the license of the assets?